Variable-time reinforcement in multiple and concurrent schedules
Similar resources
Time Variable Reinforcement Learning and Reinforcement Function Design
We introduce a mathematical model for time-variable reinforcement learning, in which the policy, the reward (reinforcement) function, and the transition probabilities may depend on the time t. We prove that, under certain conditions, slightly modified methods of classical dynamic programming are guaranteed to find the optimal policy. To that end, we deduce the Bellman equation for the time variable ...
Is matching compatible with reinforcement maximization on concurrent variable interval variable ratio?
Four pigeons on concurrent variable interval, variable ratio approximated the matching relationship with biases toward the variable interval when time spent responding was the measure of behavior and toward the variable ratio when frequency of pecking was the measure of behavior. The local rates of responding were consistently higher on the variable ratio, even when there was overall preference...
A model for residence time in concurrent variable interval performance.
A component-functions model of choice behavior is proposed for performance on interdependent concurrent variable-interval (VI) variable-interval schedules based on the product of two component functions, one that enhances behavior and one that reduces behavior. The model is the solution to the symmetrical pair of differential equations describing behavioral changes with respect to two categorie...
Evaluating Concurrent Reinforcement Learners
Assumptions underlying the convergence proofs of reinforcement learning (RL) algorithms like Q-learning are violated when multiple interacting agents adapt their strategies on-line as a result of learning. Empirical investigations in several domains, however, have produced encouraging results. We evaluate the convergence behavior of concurrent reinforcement learning agents using game matrices a...
Fast Concurrent Reinforcement Learners
When several agents learn concurrently, the payoff received by an agent is dependent on the behavior of the other agents. As the other agents learn, the reward of one agent becomes non-stationary. This makes learning in multiagent systems more difficult than single-agent learning. A few methods, however, are known to guarantee convergence to equilibrium in the limit in such systems. In this pap...
Journal
Journal title: Journal of the Experimental Analysis of Behavior
Year: 1972
ISSN: 0022-5002
DOI: 10.1901/jeab.1972.17-59